Weighted feature significance: a simple, interpretable model of compound toxicity based on the statistical enrichment of structural features.
Abstract
In support of the U.S. Tox21 program, we have developed a simple and chemically intuitive model, which we call weighted feature significance (WFS), to predict the toxicological activity of compounds based on the statistical enrichment of structural features in toxic compounds. We trained and tested the model on the following: (1) data from quantitative high-throughput screening cytotoxicity and caspase activation assays conducted at the National Institutes of Health Chemical Genomics Center, (2) data from Salmonella typhimurium reverse mutagenicity assays conducted by the U.S. National Toxicology Program, and (3) hepatotoxicity data published in the Registry of Toxic Effects of Chemical Substances. Enrichments of structural features in toxic compounds are evaluated for their statistical significance and compiled into a simple additive model of toxicity, which is then used to score new compounds for potential toxicity. The predictive power of the model for cytotoxicity was validated using an independent set of compounds from the U.S. Environmental Protection Agency that was also tested at the National Institutes of Health Chemical Genomics Center. We compared the performance of our WFS approach with classical classification methods such as Naive Bayesian clustering and support vector machines. In most test cases, WFS showed similar or slightly better predictive power, especially in the prediction of hepatotoxic compounds, where WFS appeared to have the best performance among the three methods. The new algorithm has the important advantages of simplicity, power, interpretability, and ease of implementation.
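The abstract describes the approach only in outline: test each structural feature for enrichment among toxic compounds, keep the statistically significant ones, and add up their contributions to score a new compound. The sketch below illustrates that outline under stated assumptions. The abstract does not give the test statistic or the weighting scheme, so the one-sided hypergeometric test, the `alpha` cutoff, and the `-log10(p)` weight are illustrative choices, not the authors' published formulas. The feature names and training data are hypothetical.

```python
from math import comb, log10


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric tail P(X >= k).

    N compounds in total, K of them toxic; n compounds carry the
    feature, and k of those n are toxic.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def train_weights(compounds, alpha=0.05):
    """Return {feature: weight} for features enriched in toxic compounds.

    compounds: list of (set_of_features, is_toxic) pairs.
    The weight -log10(p) is an assumed choice for this sketch.
    """
    N = len(compounds)
    K = sum(1 for _, toxic in compounds if toxic)
    all_features = set().union(*(feats for feats, _ in compounds))
    weights = {}
    for feat in all_features:
        n = sum(1 for feats, _ in compounds if feat in feats)
        k = sum(1 for feats, toxic in compounds if toxic and feat in feats)
        p = enrichment_p_value(k, n, K, N)
        if p < alpha:  # keep only significantly enriched features
            weights[feat] = -log10(p)
    return weights


def score(features, weights):
    """Additive toxicity score: sum of weights of the features present."""
    return sum(weights.get(f, 0.0) for f in features)


# Hypothetical toy training set: 5 toxic and 5 non-toxic compounds.
TRAINING = [
    ({"nitro", "hydroxyl"}, True),
    ({"nitro", "hydroxyl"}, True),
    ({"nitro"}, True),
    ({"nitro"}, True),
    ({"methyl"}, True),
    ({"hydroxyl"}, False),
    ({"hydroxyl"}, False),
    ({"hydroxyl"}, False),
    ({"methyl"}, False),
    ({"methyl"}, False),
]

weights = train_weights(TRAINING)
print(weights)
print(score({"nitro", "methyl"}, weights))
```

Because the score is a plain sum over named features, each prediction can be traced back to the specific substructures that contributed to it, which is the interpretability advantage the abstract emphasizes.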
Journal title:
- Toxicological sciences : an official journal of the Society of Toxicology
Volume 112, Issue 2
Pages -
Publication date 2009